Predicting the Internet’s Evolution with Decision Trees and Lasso Logistic Regression Models
Abstract
The Internet evolves rapidly on its own, and its dynamic structure poses many interesting questions for researchers in network analysis. In this paper I show how the entire Internet can be simplified to a mathematical graph from which structural characteristics are extracted; these characteristics in turn help build statistical models that predict how the Internet will evolve. The data describing the Internet’s structure are both clustered and unbalanced. I therefore test several models, including lasso logistic regression, gradient-boosted decision trees and random forests, to see how well they cope with unbalanced and clustered data. The best-performing model was a gradient-boosted decision tree, which balances flexibility in fitting with robustness in prediction. I show that good predictive power can be achieved using fairly simple explanatory variables, and I also discuss how more sophisticated variables can be extracted to improve the models’ performance.
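As a minimal sketch of the graph-to-variables step described in the abstract, the Python below builds an undirected graph from an edge list and computes a few simple structural variables for each pair of nodes that is not yet linked. The abstract does not name the variables actually used, so the choice of node degree, common-neighbor count and Jaccard overlap, the framing as candidate links, and the toy node labels are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def pair_features(adj, u, v):
    """Simple structural variables for a node pair (u, v).

    These particular variables are illustrative assumptions,
    not the ones reported in the paper.
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = len(nu & nv)
    union = len(nu | nv)
    return {
        "degree_u": len(nu),
        "degree_v": len(nv),
        "common_neighbors": common,
        "jaccard": common / union if union else 0.0,
    }


def candidate_rows(adj):
    """Return one (pair, features) row per currently unlinked node pair."""
    rows = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            rows.append(((u, v), pair_features(adj, u, v)))
    return rows


# Hypothetical toy snapshot; the node labels are made up.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)
rows = candidate_rows(adj)
for pair, features in rows:
    print(pair, features)
```

Rows like these, labeled by whether the link appears in a later snapshot, could then be passed to a classifier such as lasso logistic regression or gradient-boosted trees. In a large graph most candidate pairs would typically never become links, which is one way the unbalanced data mentioned in the abstract could arise.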
Similar resources
Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis
Background: Due to the importance of medical studies, researchers in this field should be familiar with various types of statistical analysis so that they can select the most appropriate method for the characteristics of their data sets. Classification and regression trees (CARTs) can be complementary to regression models. We compared the performance of a logistic regression model and a CART in predi...
Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman
Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...
Ranking stocks of listed companies on Tehran stock exchange using a hybrid model of decision tree and logistic regression
Much research has introduced linear or nonlinear models using statistical models and machine learning tools in artificial intelligence to estimate Iran's rate of return. The primary purpose of these methods is to use different independent variables simultaneously to improve the modeling of stock return rates. However, in predicting the rate of return, in addition to the modeling method, the degree of co...
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Maryam Ashoori (School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran), Fatemeh Hamzavi (School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran). Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal...